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(^ ■ ABSTRACT 

-)— > 

For a wide class of applications of the Monte Carlo method, we describe a general sampling methodology 

C^ . that is guaranteed to converge to a specified equilibrium distribution function. The method is distinct from 

G ' that of Metropolis in that it is sometimes possible to arrange for unconditional acceptance of trial moves. It 

I , involves sampling states in a local region of phase space with probability equal to, in the first approximation, 

^2 ' the square root of the desired global probability density function. The validity of this choice is derived from 

the Chapman-Kolmogorov equation, and the utility of the method is illustrated by a prototypical numerical 

experiment. 
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C""^ I In Monte Carlo calculations, it is often necessary to sample from a probability density function (PDF) 

that is known only to within a multiplicative constant. That is, we often know an analytic form for 



F(x) = Z/(x), 



where /(x) is the PDF from which we would like to sample, and where the constant Z is unknown. In 
computational statistical physics, for example, it is often necessary to sample from the Boltzmann-Gibbs 
j^ ' distribution 

/(x) = lexp[-/3iJ(x)], (1) 

tt ' where x coordinatizes the phase space, iJ(x) is the system Hamiltonian, and /? is the inverse temperature. 

Q , A simple analytic form is generally available for H{x.) but not for Z. Indeed, if such a form were known for 

Z, it would not be necessary to resort to Monte Carlo methods. 

One way to break this apparent impasse was provided by an ingenious algorithm due to Metropolis et 
^^ ■ al. [y in 1953. This algorithm provides a Markov process on the state space whose time-asymptotic state is 

5-H ' the desired PDF, /(x), but whose transition kernel K{y -^ x) involves only ratios of this PDF. It does this 

by noting that a sufficient condition for a stationary solution of the Chapman-Kolmogorov equation for the 
Markov process, 

/(x)-/dy/(y)i^(y^x), (2) 



is that of detailed balance. 



/(x)/v (x ^ y) = f{y)K{y ^ x). (3) 



To see this, we need only integrate both sides of Eq. (^) with respect to y, using the normalization constraint 
on the transition kernel. 



rfyX(x^y) = l; (4) 

the result is Eq. (H). We then note that the choice 

K{y — !■ x) — ^(y —t x) min 



/(x)T(x^y) 



/(y)r(y ^ x) 



(5) 



satisfies Eq. (||) and involves only ratios of the PDF. 

A simple application of the Metropolis method to a statistical physics problem might then begin with 
state y and select a new state x according to some trial PDF, T(y ^ x). A convenient (but not necessary) 
condition on the trial PDF is 

r(y^x) = r(x^y). (6) 

This new trial state is then accepted with probability 

min{l,exph/3(iJ(x)-ff(y))]}, 

so that the net transition probability is given by Eq. (^. Thus, the trial state is accepted with probability 
unity if it results in a decrease of energy, and with probability exp {—f3AH) if it results in an energy increase 
of AH. If the trial is not accepted, then the current state is retained. More sophisticated implementations 
of the Metropolis algorithm bias the selection of a trial state so that Eq. (m is violated M , but the detailed- 
balance condition, Eq. (||), is maintained. 

In this paper, we describe a fundamentally different kind of Markov process for the sampling of a PDF 
/(x) that is known only to within a multiplicative constant. The method is distinct from that of Metropolis 
in that moves may, in some circumstances, be accepted unconditionally. Nevertheless, it admits /(x) as a 
time-asymptotic solution of the Chapman-Kolmogorov equation, Eq. (||) . The proposed Markov process uses 
a transition rate of the form 

Jdz,T{y^z)g{z) 

where T(y —^ x) is the trial PDF, and o(x) is to be determined in terms of the desired time-asymptotic state 
/(x). Note that this form obeys Eq. (Q) manifestly. Essentially, it tells us to sample from (/(x), normalized 
over only those states accessible by the trial PDF r(y -^ x). We note that the dependence of g(x) on 
/(x) may well be complicated (nonlinear and/or nonlocal), but it must be such that the transition kernel of 
Eq. (0) is unchanged when /(x) is scaled by a multiplicative constant. 

If we insist that the trial process be symmetric in the sense of Eq. (o), then it is straightforward to verify 
that the Chapman-Kolmogorov equation, Eq. ^), with transition kernel given by Eq. (|7|), is satisfied by 

/(x)=5(x)y'dyT(y-.x).g(y). (8) 

Thus we must invert Eq. (m to get g(x) in terms of /(x). Eq. (H) is a nonlinear integral equation for ^(x), 
but the reward for inverting it is a Markov process with specified trial PDF, T(y ^ x), and stationary state 
/(x). The utility of Eq. (ph is the central observation of this paper. 

One limit in which Eq. (^) has a trivial solution is that for which all states are equally accessible. If 
T(y ^ x) is independent of x, it is easy to see that Eq. (ph is solved by g(x) c>c /(x). This solution is the 
heat-bath Monte Carlo algorithm. The proposed method can thus be regarded as a generalization of the 
heat-bath algorithm for nonuniform trial PDF's. 

We can better understand the character of the solutions to Eq. (0) if we rewrite it as follows: 



5(x) 



/(x) 



ydyT(x^y)fM 



(9) 



Let us suppose that giy) is reasonably constant over the set of y for which T(x -^ y) is appreciable. For 
simplicity, let us also suppose that the trial PDF is normalized, 



dyT(y->x) = l. 



Then the integral in the denominator is nearly unity, and ^(x) sa ■\//(x) to first approximation. In this case, 
it makes sense to solve Eq. (M) by the following successive approximation scheme: 

g(x)= limgW(x) 

i—^oo 

g^^\x^) — constant 



/(x) 



Vrfynx-y)fl^ 



The first few approximations to (?(x) are then 



g«(x) = v/7W 



/(x) 



\/dyr(x^y)^^ 



g(3)(x) = 




9^'H^) 



\ 



/(x) 



/dyr(x^y) 



\ 



/(y) / dz T(x^z) 






/(x)/dzT(y-*z) 



(10) 



dw r(x^w). 






In all cases, note that if /(x) is multiplied by a factor C, the ^'^^^(x) scale by an overall factor \/C, but this 
factor cancels when inserted into Eq. (|7|) for the transition kernel. Thus the transition kernel depends only 
on ratios of the PDF, as promised. 

At this point, we make two observations: First, all of the above considerations apply to a discrete state 
space, if we just replace the integrals by sums as appropriate. Second, all of the above considerations can 
be applied to situations for which T(x ^ x) = 0, so that the walker steps to another state with probability 
unity, though there are no guarantees that Eq. (pi) will admit a solution in that case. 

As a first simple example of this methodology for which the integral equation, Eq. (S), may be solved 
without any approximation at all, consider a five-state system with x, y € {1,2,3,4,5}, so that T(x -^ y) 
may be represented as a matrix which we take to be 



T 



Note that the diagonal elements T(x — > x) = so that moves that leave the state unchanged are expressly 
forbidden. The analog of the integral equation, Eq. (pf), is the set of nonlinear algebraic equations 
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fj = 9A9j-i +.9j+i), 

where we have used subscripts modulo 5 to denote functional arguments for simplicity. We note that scaling 
the gj by C and the fj by C^ leaves the system unchanged, and we search for solutions only to within an 
arbitrary multiplicative constant. It may be verified that the following is such a solution: 

93 = C( + fj + fj + l - fj + 2 + fj + 3 - fj + 4,){ + fj ~ fj + 1 + fj + 2 + fj + 3 - fj + i){ + fj - fj + 1 + fj + 2 - fj + 3 + fj + i), (H) 

where again all subscripts are taken modulo 5. The constant C turns out to be equal to 

1 

c = . 

but it is unnecessary to know this in order to use this solution, since only the ratios of the ^j's matter. 

To be concrete, suppose that we take fi — 1, f2 — 2, f^ — 3, f^ — 2, and f^ — I; obviously, these should 
be normalized by the factor 1/9, but let us pretend that we do not know this normalization factor. The 
above equations give 5i = 52 = 34 = ffs = 3C, and (73 = 9C. Thus, if we simulate a Markov process with 
transition matrix 
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the result will be the desired equilibrium state. Note that this Markov process is nothing like the Metropolis 
algorithm, in that trial states are accepted unconditionally. Also note that it is distinct from the heat-bath 
algorithm, wherein all states must sampled with probability proportional to /(x). 

It is of course not always possible to do what we have done in the simple example above. If we have one 
state that is much more probable than any other, our intuition tells us that a Markov process cannot force 
us to leave that state with probability unity every time we are in it, since the result would be our spending 
half of our time in a much less probable state. To see how the solution for the five-state system breaks down 
in that case, let us suppose that state 3 has a high probability f^, and the other four states have a low 
probability J^. We find that gi = g^, = (2/^ - /«)/|, 32 = 54 = (2/i - JhYIh, and 53 = /|. We note that 
sgn((;i) = sgn{gz) and sgn(g2) = sgn(sr3) = sgn(f/4), but that these signs will not be the same if Jh > 2/^. If 
they are not the same, our transition matrix will involve negative probabilities. Hence we arrive at another 
requirement on the solutions of Eq. m): Any legitimate solution for .g(x) must be of definite sign. 

It is one thing to treat a five-state system, but quite another to treat, say, a two-dimensional Ising model 
on a 10 X 10 lattice, for which there are 2^*^° distinct states. In such a situation, it will be no more possible to 
store g{x) on a computer than it is to store /(x). Thus, we must search for ways to obtain partial solutions 
to Eq. m) for large systems on the fly; more specifically, if we are in a state x, we must be able to easily 
compute g{y) for all the states for which T(x — > y) is nonzero. This is enough information to sample the 
next state and then we can repeat the process. So for the remainder of this paper we turn our attention to 
larger systems. 

It is likely that analytic solutions in the spirit of Eqs. (lO) exist for arbitrarily large systems, but they seem 
to become correspondingly more difficult to find. What is interesting is that the iterative process given in 
Eqs. (|lO| ) actually becomes easier to solve for larger systems, as long as the distribution /(x) is smooth in the 
state space. To see this, let us consider an A^-state system with periodic, nearest-neighbor state connectivity 
and 



/. 



1-^2 sin^ 



TTJ 



A^- 1 



for j = 0, . . . ,N — 1; note that when A = 5 this reduces to the numerical example used above for which 
we know that the exact solution for the gj obeys (72/ffi = 94/91 = 35/51 = 1 and 53/51 = 3. (Recall that 
only ratios of the g^-'s matter.) These results are not particularly close to the first approximation of the 

\/2, and 



iterative process of Eq. (g 



which would be 5^^V5i^^ = Vf^/h = 1, 92,1/9?'^ 



VhWi 



af'/gf' 


£ = 1 


£ = 25 


£ = 50 


£ = 75 


£ = 100 


j = 2,4 


1.41421 


1.20022 


1.05966 


1.01876 


1.00600 


j = 3 


1.73205 


2.72733 


2.90754 


2.96972 


2.99019 


TABLE I. Iterative solution with iV = 5 


9T/9['' 


£ = 1 


£ = 2 


£ = 3 


£ = 4 


£ = 100 


j = 10 


1.26302 


1.26408 


1.26378 


1.26361 


1.26261 


i = 20 


1.66176 


1.66450 


1.66410 


1.66388 


1.66301 


j = 30 


1.68466 


1.68747 


1.68707 


1.68685 


1.68598 


j = m 


1.30976 


1.31107 


1.31076 


1.31059 


1.31018 



TABLE II. Iterative solution with iV = 50 



53 /9i — \/ HI H = V3- This is because that iterative process was predicated on the smoothness of /(x), 
and there is no obvious sense in which a function defined on five discrete points is smooth. Nevertheless, as 
Table B illustrates, the iteration does appear to converge. Table EB shows that when we let A^ = 50 (without 
making the function / any more rapidly varying), the iteration converges much faster. (This is especially 
evident if we compare the values of the iteration number i in the two tables.) Moreover, we see that the 
asymptotic result is not very different from the first approximation, which is proportional to the square root 
of the distribution function. This suggests that we develop a perturbation theory for g(x) about ■\//(x), 
and we turn our attention to this approach below. 

Let us restrict our attention to continuous state spaces, and suppose that T(x — > y) = W^djx — y||), for 
some norm || • ||, where the region of support of VF(||y||) is very small compared to the characteristic scale(s) 
of /(x). Then we can Taylor expand ^(y) about the point x in Eq. (ra). In particular, if we then suppose 
that the moments of the function W are 

dyW{ly\\)^\ 

dyw^(l|y||)y = t 



(12) 



y<iyW(l|y||)yy = tt + <T 



where t is the mean and a is the variance tensor, then Eq. (g|) becomes 



/(x) = .9(x) 



.g(x) + 1 • Vg(x) + - (tt + 0-) : VV.g(x) 



= [5(x)]' 



1 



t • V.g(x) (t • V) .g(x) a : VV.g(x) 



2.9(x) 



5(x) 2.9(x) 

We can invert the above equation perturbatively to find 

t • V/ , (t • Vff a : VV/ , a : (V/) (V/) 



= V7 



1- 



4/ 



32/2 



8/ 



16/2 



where all functions and derivatives are understood to be evaluated at x. Once again, it is manifest that the 
transition kernel, Eq. (M), computed from this .g(x) will depend only on ratios of /(x). When the desired 
distribution function /(x) is of Boltzmann-Gibbs form, as in Eq. (nl), the above equation reduces to 



9 = e-^^'^ 



1 + ^t . Vif + ^ (t . VHf + ^a : VVH - ^ct : {VH) (Vff) 



(13) 



This form is potentially useful for Monte Carlo simulations of molecular systems, for which the function 
_ff (x) is known in analytic form and the state space is continuous, so that gradients of i?(x) can also be 



computed in analytic form. It is interesting that the first approximation to ^(x) in this case is y/(x) which 
is also of Boltzniann-Gibbs form, but at twice the desired temperature. 
To apply Eq. (O) to a molecular system with potential energy 



1 ^ 



«j 



and PDF exp[- PV (q)] / Z , we calculate g from Eq. (|T|) (with H 
walker at q = (qi, . . . , c[n) to one at 



V), and consider moves that take the 



q + Aq = (qi + crei, . . . , q^v + (76^), 

where the e^ are spatial unit vectors, sampled with probability (7(q+ Aq). Note that all of the particles must 
move a distance a in this scheme. The sampling of unit vectors e^ may be carried out by the Metropolis or 
any other suitable method, but after that is completed the new walker position is q + Aq with probability 
imity. For sufficiently small a, this method is guaranteed to converge and unconditionally accepts moves. A 
potential pitfall is that this sufhciently small value of a may be very small indeed. 

As a final example, we turn our attention to the Ising model with N spins s and Hamiltonian 



1 ^ 



where Afi denotes the neighborhood of spin i. Let us introduce the operator 9i such that 9iS is the state 
obtained from state s by flipping spin i. We also denote the resulting change in observable F{s) by 

<5,^(s) = ^(^,s)-F(s). 

Let us suppose that our trial PDF can take us from state s to any one of the N states ^^s for i = 1, . . . , iV. 
Once again, this condition expressly forbids the system from remaining in its present state. The analog of 
our integral equation, Eq. (H), is then 



-l3H{s) 



ff(s) 



N N 

J2 9{0^s) = g{s)- Y, [^(s) + 5,g{s) 



[9is)Y 



1 ^ 

-T 



5(s) 



so that the analog of our approximation method, Eq. (nu) is 



5(s) 



-t3His)/2 



1 + ^E 



N 5,g{s) 

1W 



Solving this involves 0{N) work at the very least and will yield N values of g, one for each proposed new 
state 0iS. We can sample from these to choose a site whose spin is then flipped with probability unity. 
As with our five-state problem, this is likely to become impossible when one state is much more probable 
than any of its neighbors, and this is likely to happen for sufficiently low temperature. A more fundamental 
problem is that the method as stated requires 0{N) work to flip one spin (albeit with unit probability). This 
is clearly prohibitive, but it may be possible to drastically improve this situation by more clever choices of 
the trial PDF. In particular, it would be interesting to study the relationship of this method to the various 
cluster algorithms that are known for the Ising model |^. 

In conclusion, the integral equation, Eq. (0), is a very useful way to frame the problem of developing 
a Markov process with specified equilibrium and trial PDF. The method is not a panacea, in that it will 
not always be possible to find a solution for 51 (x) with definite sign. The breakdown of the method when 
g has indefinite sign is reminiscent of the sign problem that is sometimes encountered in quantum Monte 
Carlo simulations y, and it would be interesting to study this relationship further. Moreover, unless a trial 
PDF is chosen very judiciously, the amount of work involved in finding g may mitigate the advantage of 



unconditional acceptance, as with the Ising model example above. In any case, we have shown that such 
a solution may be constructed perturbatively for sufficiently small step sizes in a continuous state space, 
and this observation ought to have immediate application to molecular Monte Carlo problems, as we have 
also shown. In addition, our application of this methodology to stochastic lattice Boltzmann methods is 
presented in one of the references . 
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